Web mining is the application of data mining techniques to discover patterns from the web. Web mining may be divided into web usage mining, web content mining, and web structure mining. Web content mining is the process of discovering useful information from the content of a web page. The useful information may include text, image, audio, or video data.
Text mining refers to the process of deriving high-quality information from text. In general, a web search engine may be used for text mining. The web search engine searches for information on the World Wide Web based on one or more search terms. The search engine may return search results that contain some or all of the search terms. Additionally, a filter may be used to refine the search results.
However, the web search engine and/or filter may not be effective when a user is looking for data that has a particular pair-based relationship to the search term. For example, the user may want to obtain the lower part (e.g., the second sentence) of a Chinese couplet by entering a search term containing the upper part (e.g., the first sentence) that goes together with the lower part. In this case, search results that simply list any web text containing the upper part may not be adequate. The search results may be too numerous and too random, so the user may have to spend time sorting through them to find lower parts that can go with the upper part.
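The limitation can be illustrated with a minimal Python sketch. It assumes a tiny in-memory corpus and a naive term-containment match; the function name `keyword_search`, the sample texts, and the matching rule are hypothetical and are not taken from any particular search engine.

```python
def keyword_search(corpus, query):
    """Return every document that contains at least one query term.

    Hypothetical stand-in for a term-matching search engine: it only
    checks containment and knows nothing about pair-based relationships.
    """
    terms = query.split()
    return [doc for doc in corpus if any(term in doc for term in terms)]


# Hypothetical mini-corpus (illustrative texts only).
corpus = [
    "海内存知己，天涯若比邻。",          # upper part followed by a lower part
    "海内存知己 是一句常被引用的诗句",    # mentions the upper part only
    "天涯若比邻",                        # a lower part on its own
    "今天天气很好",                      # unrelated text
]

upper_part = "海内存知己"
results = keyword_search(corpus, upper_part)

for doc in results:
    print(doc)
```

Under these assumptions, the search returns every text that contains the upper part, whether or not a lower part accompanies it, and it never returns a text that holds only a lower part. The user is still left to read through the results and pick out candidate lower parts by hand.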